
15 research outputs found

Speaker-independent emotion recognition exploiting a psychologically-inspired binary cascade classification schema

Author: A. Austermann
B. Schuller
B. Yang
C. Busso
C. M. Lee
C. Nass
D. Bitouk
D. J. C. MacKay
D. Ververidis
D. Watson
E. Benetos
E. Fersini
E. I. Konstantinidis
F. Burkhardt
Fabio Paternò
H. Altun
H. Gunes
H. K. Mishra
H. Mixdorff
H. P. Espinosa
I. Guyon
I. R. Murray
J. D. Markel
J. Hirschberg
J. Pittermann
K. Dai
K. R. Scherer
L. B. Jackson
M. Ayadi El
M. Kotti
M. M. Sondhi
M. Pantic
Margarita Kotti
N. Sato
N. Vanello
P. Boersma
P. Ekman
P. N. Juslin
P. Ruvolo
P. Zervas
R. A. Calvo
R. Cowie
R. Tato
R. W. Picard
S. Chandaka
S. Ntalampiras
T. Iliou
T. L. Pao
T. P. Kostoulas
T. Vogt
W. Bosma
W. Minker
Z. Inanoglu
Z. Zeng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2012

In this paper, a psychologically-inspired binary cascade classification schema is proposed for speech emotion recognition. Performance is enhanced because commonly confused pairs of emotions are distinguishable from one another. Extracted features are related to statistics of pitch, formants, and energy contours, as well as spectrum, cepstrum, perceptual and temporal features, autocorrelation, MPEG-7 descriptors, Fujisaki's model parameters, voice quality, jitter, and shimmer. Selected features are fed as input to a k-nearest neighbor classifier and to support vector machines. Two kernels are tested for the latter: linear and Gaussian radial basis function. The recently proposed speaker-independent experimental protocol is tested on the Berlin emotional speech database for each gender separately. The best emotion recognition accuracy, achieved by support vector machines with linear kernel, equals 87.7%, outperforming state-of-the-art approaches. Statistical analysis is first carried out with respect to the classifiers' error rates and then to evaluate the information expressed by the classifiers' confusion matrices. © Springer Science+Business Media, LLC 2011

Crossref

Spiral - Imperial College Digital Repository

Extraction, Analysis and Synthesis of Fujisaki model Parameters

Author: A. V. Isačenko
E. Stock
H. Fujisaki
H. Mixdorff
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Hearing and seeing beats: The influence of visual beats on the production and perception of prominence

Author: Hoffmann R.
Krahmer E.J.
Mixdorff H.
Swerts M.G.J.
Publication venue: TUDpress
Publication date: 01/01/2006

A Study on the Perception of Tone and Intonation in Sesotho

Author: Machobane M
Mixdorff H
Mohasi L
Niesler TR
Publication date: 01/01/2011


Stellenbosch University SUNScholar Repository

Directions for the future of technology in pronunciation research and teaching

Author: Cucchiarini C.
Derwing T.M.
Foote J.A.
Hardison D.M.
Levis G.M.
Mixdorff H.
O’Brien M.M.
Strik H.
Thomson R.I.
Publication date: 01/01/2019

Contains fulltext: 199273.pdf (publisher's version) (Open Access). 25 p.

Radboud Repository

Manipulating uncertainty: The contribution of different audiovisual prosodic cues to the perception of confidence

Author: Dijkstra C.
Hoffmann R.
Krahmer E.J.
Mixdorff H.
Swerts M.G.J.
Publication venue: TUDpress
Publication date: 01/01/2006

Audio-visual expressions of attitude: How many different attitudes can perceivers decode?

Author: Hönemann Angelika
Lee Tan
Ma Matthew K. H.
Mixdorff Hansjoerg
Rilliard Albert
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

Mixdorff H, Hönemann A, Rilliard A, Lee T, Ma MKH. Audio-visual expressions of attitude: How many different attitudes can perceivers decode? Speech Communication. 2017;95:114-126.

Based on the paradigm by Rilliard et al., we collected audio-visual expressions of attitudes such as arrogance, irony, sincerity and politeness in German. In the experimental design, subjects are immersed in sixteen different communicative situations in which they are supposed to portray a certain attitude in a short dialog. Attitudes can be propositional, that is, reactions to a factual situation, and/or social, that is, with respect to the relationship with the collocutor. Furthermore, attitudes can be of positive or negative valence or neutral. Undeniably there is a large repertory of subtle differences in the way certain talkers express certain attitudes. The important question is, however, whether collocutors either from the same language or a different one can actually decode these attitudes reliably. On that account we carried out three perceptual experiments in which we presented our recordings of the portrayed attitudes audiovisually, audio-only, and video-only. In the first study, German perceivers rated the expressions given the intended attitude; in the second study, they had to choose the most suitable from a choice of five attitudes; and in the third study, raters were able to freely assign the term best matching each attitudinal expression. This last experiment was recently replicated by native speakers of Cantonese in Hong Kong. The current article reviews and reevaluates the results from the first three experiments with the German subjects under the premise that perceivers actually have a more limited set of attitudinal registers which they can reliably draw on. This means that expressions can be sorted into a much smaller number of categories than the projected sixteen. In addition, we compare and contrast these resulting clusters with the new data from the Cantonese-speaking group. Our results indeed indicate a small number of readily decoded attitudes forming four clusters, depending on the experiment design, which are also distinct acoustically. Clusters from the statistical analysis are very similar for the German and the Cantonese perceivers and overlap with basic emotions. This result suggests that expressions of attitudes with low identification rates are more complex to decode and require more pragmatic information, that is, more contextual and possibly idiosyncratic information to be interpreted correctly. © 2017 Elsevier B.V. All rights reserved

Publications at Bielefeld University

An Eye-Tracking Study on Audiovisual Speech Perception Strategies Adopted by Normal-Hearing and Deaf Adults Under Different Language Familiarities

Author: Abdilbar Mamat
Argyle M.
Barone P.
Fuster-Duran A.
Haspelmath M.
Jianrong Wang
Jianwu Dang
Ju Zhang
Mei Yu
Mixdorff H.
Ronnberg J.
Rossano F.
Wang Y.
Yu Chen
Yumeng Zhu
Publication venue: 'American Speech Language Hearing Association'

Crossref

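Entwicklung einer Prosodiesteuerung fuer die Sprachsynthese in hoher Qualitaet zum Einsatz in Text-to-Speech-Systemen. Abschlussbericht [Development of a prosody control for high-quality speech synthesis for use in text-to-speech systems. Final report]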

Author: Mehnert D.
Mixdorff H.
Technische Univ. Dresden (Germany). Fakultaet Elektrotechnik
Technische Univ. Dresden (Germany). Inst. fuer Technische Akustik
Publication date: 01/01/1998

SIGLE. Available from TIB Hannover: F99B118 / FIZ - Fachinformationszentrum Karlsruhe / TIB - Technische Informationsbibliothek. Deutsche Forschungsgemeinschaft (DFG), Bonn (Germany). DE. German

OpenGrey Repository